ABSTRACT

Machine Learning (ML) is a branch of artificial intelligence that allows systems to learn from
data and detect patterns with little human intervention. As technology expands, machine learning
provides an exciting opportunity in the health sector to improve the accuracy of diagnoses and to
personalise health care. Cancer has been characterised as a heterogeneous disease consisting of many
different subtypes. In cancer research, it has become a necessity to diagnose the type of cancer and
establish its prognosis as early as possible, because this facilitates the subsequent clinical
management of patients. To this end, several machine learning techniques have been widely
applied in cancer research to develop predictive models, resulting in effective and
accurate decision making. SVM, KNN, DT, LR, CNN, ANN, RF and MLP are among the ML
techniques used to model the progression and treatment of cancerous conditions. Apart from ML
techniques, certain image processing tools are being introduced to detect cancer. Getting
a clear-cut classification from a biopsy image is a difficult task, as the pathologist must know
the detailed features of normal and affected cells. In this paper we review the
effectiveness of these ML approaches in detecting different types of cancer, as reported in
previously published research papers. The analysis and assessment techniques of
the selected papers are discussed, and an appraisal of the findings is presented to conclude the article.
Keywords: Machine Learning, Cancer, Review, Biopsy 


INTRODUCTION 

Cancer is the abnormal, uncontrolled growth of cells and one of the most serious health problems
affecting the human body. It is a genetic disease caused by changes to genes. It can take many forms
and is very difficult to detect in its early stages. Among all cancer types, the mortality rate of lung
cancer is the highest. Other types of cancer include Gastric Cancer, Liver Cancer, Carcinoma,
Sarcoma, Leukaemia, Lymphoma and Multiple Myeloma.

Cancer detection often involves radiological imaging. Radiological imaging is used to check the
spread of cancer (that is, the affected areas), to follow the progress of treatment and
to monitor the disease. Oncological Imaging is more accurate.

In cancer research and oncology, the successful application of Machine Learning (ML) and Deep
Learning (DL) techniques has recently demonstrated fundamental improvements in image-based
disease diagnosis and detection. ML and DL frameworks have also been applied to cancer
diagnosis, classification and treatment by exploiting genomic profiles and phenotype data.

In this review paper we focus on the ML aspect of Artificial Intelligence based applications in cancer 
research. 


Various Machine Learning Techniques to detect cancer 
CNN: In image processing, the image is taken as input and loaded into the program. The image
is then divided into 12 segments and a Convolutional Neural Network (CNN) is applied to each segment.
The CNN is used to estimate the amount of each type of cancer cell present in each segment. After
segmentation, the average over the 12 parts is taken and stored in another file, which acts
as intermediate output[1][2].
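
The segment-and-average step described above can be sketched as follows. This is only an illustration under stated assumptions: the 3 x 4 tiling is one possible way to obtain 12 segments, and `score_segment` is a hypothetical stand-in (mean pixel intensity) for the per-segment CNN output, which the reviewed papers do not specify here.

```python
from statistics import mean

def split_into_segments(image, rows=3, cols=4):
    """Split a 2-D list of pixel values into rows*cols rectangular tiles.

    The 3 x 4 layout is an assumption; the text only states that 12 segments are used.
    """
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // rows, width // cols
    segments = []
    for r in range(rows):
        for c in range(cols):
            tile = [row[c * tile_w:(c + 1) * tile_w]
                    for row in image[r * tile_h:(r + 1) * tile_h]]
            segments.append(tile)
    return segments

def score_segment(segment):
    """Hypothetical placeholder for the CNN: mean pixel intensity of the tile."""
    return mean(value for row in segment for value in row)

def image_score(image):
    """Average the per-segment scores to obtain the intermediate output."""
    return mean(score_segment(s) for s in split_into_segments(image))

# Toy 6 x 8 checkerboard image; the pixel values are made up for illustration.
image = [[(r + c) % 2 for c in range(8)] for r in range(6)]
print(len(split_into_segments(image)), image_score(image))
```

In a real pipeline the placeholder would be replaced by a trained network applied to each tile, while the splitting and averaging logic would stay the same.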


ANN: An Artificial Neural Network (ANN) is a theoretical mathematical model of the human neural
network; it is also described as an information processing system based on the structure and function
of biological neural networks. ANNs are becoming more and more useful and their application fields
are also expanding[3][4].


KNN: The K-Nearest Neighbours (KNN) algorithm is one of the simplest and most widely used
machine learning algorithms and is a non-parametric learning method. A distance function such as
Euclidean distance or Manhattan distance is used to find the k data points nearest to the data point
in question, and the majority class among those neighbours is taken as the prediction[5][6].
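
As a minimal sketch of the neighbour search and majority vote, the following pure-Python example classifies a query point with either distance function. The feature values and labels are invented for illustration and do not come from any of the reviewed datasets.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Return the majority label among the k training points nearest to query."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy two-feature data; values and labels are made up for illustration.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (3.0, 3.0), distance=manhattan))
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.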


SVM: Support Vector Machine (SVM) is one of the most popular supervised learning algorithms
and can be used for classification, regression and outlier detection. The goal of the SVM
algorithm is to find the best decision boundary (a hyperplane) that separates n-dimensional space into
classes, so that a new data point can easily be placed in the correct category[7][8].
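
To illustrate how a decision boundary assigns a new point to a class, the sketch below evaluates a linear SVM that is assumed to be already trained. The weights and bias are hypothetical values chosen for the example, not parameters from any cited study, and training (finding the maximum-margin hyperplane) is not shown.

```python
import math

def decision_value(weights, bias, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def svm_classify(weights, bias, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1

def margin_width(weights):
    """Distance between the two margin hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

# Hypothetical parameters of an already trained linear SVM.
weights, bias = [1.0, 1.0], -4.0

print(svm_classify(weights, bias, [3.0, 3.0]))
print(svm_classify(weights, bias, [1.0, 1.0]))
print(margin_width(weights))
```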


DT: Decision Tree (DT) is a supervised learning technique that can be used for both
classification and regression problems, although it is mostly preferred for classification.
It is a tree-structured classifier that provides a nonparametric method for partitioning datasets. It has
a hierarchical tree structure, which consists of a root node, branches, internal nodes and leaf
nodes[9][10].
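
The root, internal and leaf nodes can be represented directly in code. The following is a minimal sketch of prediction with a hand-built tree; the feature names and thresholds are hypothetical, and learning the splits from data is not shown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # feature tested at a root or internal node
    threshold: Optional[float] = None  # follow the left branch if value <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: Optional[str] = None        # set only on leaf nodes

def predict(node, sample):
    """Walk from the root to a leaf and return the leaf's class label."""
    while node.label is None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label

# Hand-built tree with made-up features and thresholds.
tree = Node("cell_size", 2.5,
            left=Node(label="benign"),
            right=Node("clump_thickness", 5.0,
                       left=Node(label="benign"),
                       right=Node(label="malignant")))

print(predict(tree, {"cell_size": 4.0, "clump_thickness": 7.0}))
```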


LR: Logistic Regression (LR) is one of the most widely used supervised learning methods in machine
learning and is used to predict the probability of a target variable. It predicts the output
of a categorical dependent variable from a given set of independent variables. That is why it is the
go-to method for binary classification problems[11][12].
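
The probability prediction can be written in a few lines. This sketch applies the logistic (sigmoid) function to a weighted sum of the independent variables and thresholds the result at 0.5; the weights and bias are hypothetical and would normally be fitted to training data.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of the positive class for feature vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict_class(weights, bias, x, threshold=0.5):
    """Binary decision obtained by thresholding the predicted probability."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0

# Hypothetical fitted parameters, for illustration only.
weights, bias = [2.0, -1.0], -0.5

print(predict_proba(weights, bias, [1.0, 0.5]))
print(predict_class(weights, bias, [0.0, 1.0]))
```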


NB: The Naive Bayes (NB) algorithm is a supervised learning technique that works on the principle of
Bayes' theorem and is used in a wide variety of classification tasks. It is a probabilistic classifier
that makes predictions on the basis of the probability that an object belongs to each class, and it is
one of the most effective classification algorithms. NB is mainly used in text classification, which
involves high-dimensional training datasets[13][14].
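
As a worked illustration of Bayes' theorem with the naive independence assumption, the sketch below multiplies a class prior by per-feature likelihoods and normalises the result. All probabilities and feature names are invented for the example.

```python
def naive_bayes_posterior(priors, likelihoods, observed):
    """Return P(class | observed) assuming features are independent given the class.

    priors:      {class: P(class)}
    likelihoods: {class: {feature: {value: P(value | class)}}}
    observed:    {feature: value}
    """
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for feature, value in observed.items():
            score *= likelihoods[cls][feature][value]
        scores[cls] = score
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}

# Made-up probabilities, for illustration only.
priors = {"benign": 0.6, "malignant": 0.4}
likelihoods = {
    "benign": {"margin": {"smooth": 0.8, "irregular": 0.2},
               "size": {"small": 0.7, "large": 0.3}},
    "malignant": {"margin": {"smooth": 0.3, "irregular": 0.7},
                  "size": {"small": 0.2, "large": 0.8}},
}

posterior = naive_bayes_posterior(priors, likelihoods,
                                  {"margin": "irregular", "size": "large"})
print(max(posterior, key=posterior.get), posterior)
```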


RF: Random Forest (RF) is a supervised learning classifier that works on the
principle of ensemble learning. RF builds a number of decision trees on various subsets
of the given dataset and combines their outputs to improve predictive accuracy. For
classification tasks, the output of the random forest is the class selected by most trees. Random
forests correct for decision trees' habit of overfitting to their training set[15][16].
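
The "class selected by most trees" rule can be sketched as a simple majority vote. The three single-split trees below are hypothetical and hand-written; in an actual random forest each tree would be trained on a different random subset of the data, which is not shown here.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Collect one vote per tree and return the most common class."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical one-split trees; features and thresholds are made up.
trees = [
    lambda s: "malignant" if s["cell_size"] > 3 else "benign",
    lambda s: "malignant" if s["clump_thickness"] > 5 else "benign",
    lambda s: "malignant" if s["mitoses"] > 2 else "benign",
]

sample = {"cell_size": 4, "clump_thickness": 6, "mitoses": 1}
print(forest_predict(trees, sample))
```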


MLP: Multilayer Perceptron (MLP) is a feed-forward neural network that is widely used
in machine learning. It consists of at least three layers: the input layer, one or more hidden
layers and the output layer. An MLP is characterised by several layers of nodes connected as a
directed graph between the input and output layers. MLP uses backpropagation for training the network[17][18].
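
A forward pass through the three layers can be sketched as below. The weights are hand-picked so that the network computes XOR, a function no single-layer perceptron can represent; they are an assumption for illustration and were not obtained by backpropagation, which is not shown.

```python
def relu(z):
    """Rectified linear activation."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W.x + b) for each output node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x):
    """Input layer (2 nodes) -> hidden layer (2 nodes, ReLU) -> output layer (1 node)."""
    hidden = dense(x, [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    output = dense(hidden, [[1.0, -2.0]], [0.0], lambda z: z)
    return output[0]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, mlp_forward(pair))
```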

Table 1: Comprehensive analysis of various Machine Learning algorithms to detect Breast Cancer

Author & Year | Data Source | ML technique | Accuracy (%)
(Alireza Osareh, Bita Shadgar, 2010) | The dataset of fine needle aspirate of breast lesions (dataset I); a second dataset comprised of gene microarrays comes from references [19] and [20] | SVM | 98.00
 | | KNN | 96.00
 | | PNN | 97.00
(Meriem Amrane, Ikram Gagaoua, 2018) | UCI Machine Learning Repository | KNN | 97.50
 | | NB | 96.10
(Wenbin Yue, Hongwei Chen, 2018) | WBCD Database | DT | 93.60
 | | ANN | 91.20
(Ebrahimi M, Eshlaghy AT, 2013) | ICBC dataset in the National Cancer Institute of Tehran for the years 1997-2008 | DT | 93.60
 | | ANN | 94.70
 | | SVM | 95.70
(Wang et al., 2020) | | RBF SVM | 97.66
 | | DT |
(Hiba Asri, Thomas Noel, 2016) | UCI Machine Learning Repository | SVM | 97.10
 | | NB | 95.90
 | | KNN | 95.20
(Mitra Montazeri, Mahdieh Montazeri, 2016) | A database which included the information of 900 patients during 1999-2007 that was recorded by the Cancer Registry Organization of Kerman Province, in Iran | SVM | 94.00
 | | NB | 95.00
(Omar Ibrahim Obaid, Salama A. Mostafa, 2018) | UCI machine learning repository | SVM | 98.10
 | | KNN | 96.70
 | | DT | 93.70
(David A. Omondiagbe, Amandeep S. Sidhu, 2019) | UCI machine learning repository | SVM | 96.40
 | | NB | 91.10
(Jiande Wu, Chindo Hicks, 2021) | TCGA and GDC database | SVM | 90.00
 | | KNN | 87.00
(Anji Reddy Vaka, Badal Soni, 2020) | M. G Cancer Hospital & Research Institute, Visakhapatnam, India | SVM | 95.70
 | | NB | 95.60
(Tolga Ensari, Ebru Aydindag Bayrak, 2019) | UCI Machine Learning Repository | SVM | 96.90
 | | | 95.40
 | | | 95.30
(Shubham Sharma, Tanupriya Choudhury, 2018) | UCI Machine Learning repository | KNN | 95.90
 | | NB | 94.40
(Sadeq Darrab, Gunter Saake, 2020) | UCI Machine Learning Repository | NB | 71.60
 | | SMO | 69.60
(Yixuan Li, Zixuan Chen, 2018) | BCCD and WBCD database | | 68.60
 | | | 65.70
 | | | 71.40
(Abien Fred M. Agarap, 2018) | WDBC dataset | | 96.00
 | | | 96.20
(Gunjan Chugh, ... Singh, 2021) | The list of publicly available datasets used by the researchers in recent years | SVM | 90.00
 | | CNN | 83.10
 | | DT |
(Habib Dhahri, ... Mahmood, 2019) | UCI Machine Learning Repository | | 97.60
 | | NB | 98.00
 | | | 96.90
(Vikas Chaurasia, Saurabh Pal, 2020) | WHO "Data WHO Coronavirus Covid-19 cases and deaths-WHO-COVID19-global-data" | |
(Ch. Shravya, Shaik Subhani, 2019) | UCI machine learning repository | SVM |


Table 2: Comprehensive analysis of various Machine Learning algorithms to detect Lung Cancer

Author & Year | Data Source | ML technique | Accuracy (%)
(Zhihua Cai, Dong Xu, 2014) | TCGA and GEO database | |
(Radhika P R, Rakhi.A.S.Nair, 2019) | UCI Machine Learning Repository | |
(Syed Saba Raoof, M A. Jabbar, 2020) | UCI ML Database | |
(Ying Xie, Wei-Yu Meng, 2020) | Data collected from a group of patients of Hubei Taihe Hospital | NB, SVM |
(Nikita Banerjee, Subhalaxmi Das, 2020) | Sumathipala et al., [9] proposed a model where the image data are taken from LIDC-IDRI | ANN | 96.00 (TB)
 | | SVM | 86.00 (RB)
 | | RFs | 79.00 (RB)
(Kwetishe Joro Danjuma, 2015) | Thoracic Surgery datasets obtained from the University of California Irvine machine learning repository | MLP, J48, NB |
(Elias Dritsas, Maria Trigka, 2022) | https://www.kaggle.com/datasets/mysarahmadbhat/lung-cancer | ANN, NB, DT, KNN |
(Muhammad Imran Faisal, Saba Bashir, 2018) | UCI online repository | Auto MLP, NB, SVM, DT, GBT |
(Qing Wu, Wenbing Zhao, 2017) | https://www.kaggle.com/c/data-science-bowl-2017 | NN | 77.80
(Gur Amrit Pal Singh, P. K. Gupta, 2018) | Cancer Research UK (2017) Cancer mortality for common cancers. http://www.cancerresearchuk.org/health-professional/cancer-statistics/mortality/common-cancers-compared | KNN, SVM, NB |
(Radhanath Patra, 2020) | https://archive.ics.uci.edu/ml/datasets/Lung+cancer | |
(Dakhaz Mustafa Abdullah, Adnan Mohsin Abdulazeez, 2021) | UCI machine learning repository | |
(Amjad Rehman, Noor Ayesha, 2021) | Chest CT scan image dataset | SVM | 93.00
 | | KNN | 91.00
(S. Shanthi, N. Rajkumar, 2020) | TCGA Database | MRMR - DT | 83.33
(Marta Borowska, Ewelina Bebas, 2021) | 155 magnetic resonance images with a metabolic active lung tumour according to the PET / MR scan | NB |
(Chinmayi Thallam, Aarsha Peruboyina, 2020) | Text data collected from data.world | SVM |
(S. Baskar, P. Mohamed Shakeel, 2019) | Computed Tomography (CT) images | SVM | 90.90
(Rashmee Kohad, Vijaya Ahire, 2015) | cancerimagingarchives.net, Mahatma Gandhi Mission and Tapadia Diagnostic Center situated at Aurangabad | ACO_SVM | 93.20
 | | ACO_ANN | 98.40
(Özge GÜNAYDIN, Melike GUNAY, 2019) | Standard Digital Image Database, Japanese Society of Radiological Technology (JSRT) | SVM |
(Kesav Kancherla, Srinivas Mukkamala, 2013) | Public Sample Collection | SVM RBF, NB, MLP, Bagging RBF, AdaBoost RBF, RFs, MLM, SMO, LLM, AdaBoost LLM, Bagging LLM |


Table 3: Comprehensive analysis of various Machine Learning algorithms to detect Liver Cancer

Author and Year | Data Source | ML technique | Accuracy (%)
(Amita Das, U. Rajendra Acharya, 2019) | Imaging centre of IMS and SUM Hospital, India | SVM, KNN, AdaBoost M1, J48 |
(Sangman Kim, Seungpyo Jung, 2014) | Raw data of serums from 314 normal and 81 liver cancer patients | |
(Samreen Naeem, Aqib Ali, 2020) | Bahawal Victoria Hospital (BVH), Bahawalpur, Pakistan | |
(Somaya Hashem, Shahira Habashy, 2020) | Data were collected from two institutes in Egypt: the Egyptian National Committee for the Control of Viral Hepatitis and the multidisciplinary HCC clinic at Cairo University's Kasr Al-Aini Hospital | |
(Maria Alex Kuzhippallil, Carolyn Joseph, 2020) | UCI ML Repository | KNN, DT, LR, AdaBoost |
(Elias Dritsas, Maria Trigka, 2023) | Indian Liver Patients' Records dataset | |
(Zhaoyang Cao, 2020) | Bai et al's research, which is the NGS data of the pre-S/S region of the HBV genome with 400-500 base pairs of nucleic acids | |
(Bhawana Maurya, Manoj Kumar, 2020) | Reference number [77] to [82] | SVM, NN, RF, PSO-SFS-SBS, PSO-SFS |
(Suruchi Fialoke, Anders Malarstig, 2018) | American Medical Informatics Association | |
(Enagandula Prasad, V. Durga Prasad Jasti, 2022) | Data source not found | |


Table 4: Comprehensive analysis of various Machine Learning algorithms to detect Skin Cancer

Author and Year | Data Source | ML technique | Accuracy (%)
(Kinnor Das, Anant Patil, 2021) | | InceptionV4 | 76.50
 | | CNN | 85.30
(Vrunda Shah, Manan Shah, 2022) | PH2; ISIC | KNN | 85.00
 | | SVM | 78.00
(Mahmoud Elgamal, 2013) | Data source not found | ANN | 95.00
 | | KNN | 97.50
(Carolina Magalhaes, Joao Manuel R.S. Tavares, 2021) | Data source not found | SVM | 92.60
 | | CNN | 91.00
(Zhoufeng Zhang, Junle Qu) | | SVM | 87.40
 | | CNN | 88.90
(Yuheng Wang, Harvey Lui, 2021) | Skin Care Centre of Vancouver General Hospital | SVM, KNN, RF |
(Titus Josef Brinker, Achim Heckler, 2018) | ISIC 2018, HAM10000 | SVM | 93.60
 | | CNN | 79.50
(Afsana Ahsan Jeny, Masum Shah Junayed, 2020) | Collected the data from some hospitals and flocked together some pictures available online | CNN, Inception V4 |
(Shunichi Jinnai, Ryuji Hamamoto, 2020) | Department of Dermatologic Oncology in the National Cancer Center Hospital | FRCNN, BCD, TRN |

Discussion and Analysis of various Machine Learning Algorithms

Fig. 1: Average accuracy for each ML technique used to detect Breast Cancer
Fig. 2: Maximum accuracy of a particular ML technique in Breast Cancer detection


The results above show that, in the detection of Breast Cancer, the maximum
accuracy of 98.00% was achieved with the Support Vector Machine (SVM) and Naive Bayes
(NB) algorithms. The Probabilistic Neural Network (PNN), on the other hand, gave the highest
average accuracy (97.00%).
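
The average and maximum figures discussed in this section can be recomputed from (technique, accuracy) pairs. The sketch below is a minimal illustration that uses only the first two rows of Table 1 (Osareh and Shadgar, 2010; Amrane and Gagaoua, 2018) as input, so its output is not the full-table result shown in the figures.

```python
from collections import defaultdict
from statistics import mean

# (technique, accuracy %) pairs taken from the first two rows of Table 1 only.
results = [("SVM", 98.00), ("KNN", 96.00), ("PNN", 97.00),
           ("KNN", 97.50), ("NB", 96.10)]

def summarise(results):
    """Group accuracies by technique and report their average and maximum."""
    by_technique = defaultdict(list)
    for technique, accuracy in results:
        by_technique[technique].append(accuracy)
    return {technique: {"average": mean(values), "maximum": max(values)}
            for technique, values in by_technique.items()}

summary = summarise(results)
for technique, stats in summary.items():
    print(technique, stats)
```

Extending `results` with every row that reports an accuracy would give the per-technique averages and maxima for the whole table.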


Fig. 3: Average accuracy for each ML technique used to detect Lung Cancer
Fig. 4: Maximum accuracy of a particular ML technique in Lung Cancer detection


The results above show that, in the detection of Lung Cancer, the maximum
accuracy of 100.00% was achieved with the Naive Bayes (NB) technique, followed by the Voting
Classifier (VC) (99.50%) and Support Vector Machine (SVM) (99.20%) techniques. The
Convolutional Neural Network (CNN), on the other hand, gave the highest average accuracy (92.11%).


Fig. 5: Average accuracy for each ML technique used to detect Liver Cancer
Fig. 6: Maximum accuracy of a particular ML technique in Liver Cancer detection

The results above show that, in the detection of Liver Cancer, the maximum
accuracy of 100.00% was achieved with the Neural Network (NN) technique, followed by the
Multilayer Perceptron (MLP) (99.00%) and Support Vector Machine (SVM) (98.50%) techniques.
The Neural Network (NN) also gave the highest average accuracy (99.29%).


Fig. 7: Average accuracy for each ML technique used to detect Skin Cancer
Fig. 8: Maximum accuracy of a particular ML technique in Skin Cancer detection


The results above show that, in the detection of Skin Cancer, the maximum
accuracy of 97.50% was achieved with the K-Nearest Neighbours (KNN) technique,
followed by the Artificial Neural Network (ANN) (95.00%) and Support Vector Machine (SVM)
(93.60%) techniques. The Artificial Neural Network (ANN), on the other hand, gave the highest
average accuracy (85.95%).


CONCLUSION 

The main aim of the system 'Cancer Detection using Machine Learning' is early detection of cancer.
The number of cancer diagnoses is increasing day by day. As a result, better technology
is needed to increase the chance of survival. In this review we have tried to explain, compare and
assess the performance of different machine learning techniques that are being applied to cancer
prediction and prognosis. We have studied different types of cancer and how
accurately these machine learning techniques predict the outcomes. Compared to conventional
statistical or expert-based systems, machine learning methods generally improve the
performance or predictive accuracy of most prognoses. Improving the overall quality,
generality and reproducibility of machine-based classifiers depends on improvements in experimental
design along with improved biological validation. As the quality of studies improves, the
use of machine learning classifiers is likely to increase in many clinical and hospital settings.